Abstract

Background: Constraint-based models enable structured cellular representations in which intracellular kinetics are 
circumvented. These models, combined with experimental data, are useful analytical tools to estimate the state 
exhibited (the phenotype) by the cells at given pseudo-steady conditions. 

Results: In this contribution, a simplified constraint-based stoichiometric model of the metabolism of the yeast
Pichia pastoris, a workhorse for heterologous protein expression, is validated against several available experimental
datasets. Firstly, maximum theoretical growth yields are calculated and compared to the experimental ones.
Secondly, possibility theory is applied to quantify the consistency between model and measurements. Finally, the 
biomass growth rate is excluded from the datasets and its prediction used to exemplify the capability of the 
model to calculate non-measured fluxes. 

Conclusions: This contribution shows how a small-sized network can be assessed following a rational, quantitative
procedure even when measurements are scarce and imprecise. This approach is particularly useful in data-scarce
scenarios.



Background 

The collection of biochemical reactions involved in the 
metabolism of a cell can be assembled in networks in 
order to carry out studies under a system-level approach 
[1]. Such analyses have been performed with large, even genome-scale,
reconstructions of well-characterised organisms such as
Escherichia coli, Saccharomyces cerevisiae,
Pseudomonas putida [2-4], and also with simpler net- 
works that consider only a few key metabolites [5-7]. 

Given a metabolic network, a matrix equation can be 
used in order to describe the mass balances around the 
nodes, the m internal metabolites:

$$ \frac{dc}{dt} = N \cdot v \qquad (1) $$

in which N is the stoichiometric matrix, c is a vector of metabolite concentrations
and v is the vector of reaction rates, or fluxes, represent- 
ing the mass flow through each of the n reactions in the 
network [8]. 

In order to avoid reaction kinetics, still rarely known, 
the internal metabolites are often assumed not to accu- 
mulate and thus (1) turns into a system of linear equa- 
tions. Then, other constraints can be imposed; for 
instance, it is common to consider particular enzyme 
kinetics [9], thermodynamics [2,10], or the irreversibility 
of certain reactions using inequalities. In this way, a 
constraint-based model can be assembled [11,12]. 

By combination of this model and a set of measurable 
fluxes, the remaining ones can be estimated performing 
a metabolic flux analysis (MFA) [13]. It is even possible 
to incorporate intracellular measurements from stable 
isotope tracer experiments to apply 13C-MFA [14,15]. 
Unfortunately, these data are not available in most 
cases. Indeed, scarcity of measurements often results in 
practice in underdetermined systems, and therefore tra- 
ditional MFA cannot be performed. In this context, a 
constraint-based approach that attempts to provide a 
range of candidate flux states instead of predicting the 
actual one with precision [11,16] can be of use. In any 
case, MFA can only be performed using reasonably
small networks with favourable structures: otherwise its 
under-determinacy can be neither removed, even when 
tracer experiments are available, nor reduced enough to 
get valuable estimates with a constraint-based approach. 

Besides, these medium-sized networks are derived
from the known biochemical reactions involved in the
metabolism of a cell and necessarily rely on reductionist
hypotheses, while their validation is often insufficient. They
are seldom validated against datasets different from the
one of interest, which is thus inconveniently used both
to validate the model and to perform the MFA.
Herein we discuss a procedure that seeks further
validation of these networks.

The methylotrophic yeast Pichia pastoris is recognized
worldwide as a reference platform for the expression
of recombinant proteins in eukaryotes, due to the
possibility of growing cultures to very high cell densities,
its ability to perform post-translational modifications,
and its good protein yield/cost ratio. Heterologous
genes are cloned under the strong and tightly regulated
alcohol oxidase promoter of P. pastoris, and are thus expressed
when the cells grow on methanol as sole or combined
carbon source.

The optimization of recombinant protein expression
in P. pastoris has usually been addressed heuristically.
Only a few publications [17-19] describe rational, 
model-based optimisation and control of Pichia growth 
and protein production. Among these, semi-structured, 
metabolism-based models representing intracellular 
behaviour are particularly rare [20,21]. 

In the following sections, a constraint-based model of 
P. pastoris will be described and validated against the 
available experimental data. Then, its ability to predict 
non-measured fluxes will be illustrated by estimating the 
biomass growth rate. The potential use of the model for 
the estimation of intracellular fluxes will also be dis- 
cussed. In summary, this work applies a systematic, yet 
simple, procedure to provide further validation for a 
small-sized model of P. pastoris, using only data from 
extracellular measurements. 

Methods 

Constraint-based model 

A constraint-based model, assuming that internal meta- 
bolites are at steady-state and considering the irreversi- 
bility of some reactions, can be described with a set of 
model constraints ($MOC$) as follows:

$$ MOC = \begin{cases} N \cdot v = 0 \\ D \cdot v \ge 0 \end{cases} \qquad (2) $$

where $v$ is the vector of reaction rates, or fluxes,
representing the mass flow through each of the n
reactions in the network, N is the stoichiometric matrix,
and D is a diagonal matrix with $D_{ii}$ = 1 if the flux i is
irreversible (otherwise 0).

The constraints in (2) define a space of feasible 
steady-state flux distributions, or flux states, which ide- 
ally comprises every theoretically possible phenotype: 
only flux vectors v that fulfill (2) are considered valid 
cellular states. 
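
As a minimal sketch of what the constraints in (2) mean in practice, the following Python snippet checks whether a candidate flux vector is a feasible steady state. The three-reaction toy network, the function name `satisfies_moc` and the numerical tolerance are illustrative assumptions; they are not part of the P. pastoris model.

```python
def satisfies_moc(N, irreversible, v, tol=1e-9):
    """Return True if v fulfils N.v = 0 and v_i >= 0 for every irreversible flux."""
    # Steady state: each internal metabolite balance must be (numerically) zero.
    for row in N:
        if abs(sum(a * x for a, x in zip(row, v))) > tol:
            return False
    # Irreversibility: D.v >= 0, with D_ii = 1 only for irreversible fluxes.
    return all(x >= -tol for x, irr in zip(v, irreversible) if irr)


# Hypothetical toy network with a single internal metabolite A:
#   v1: substrate -> A,  v2: A -> biomass,  v3: A <-> by-product (reversible)
N = [[1, -1, -1]]
irreversible = [True, True, False]

print(satisfies_moc(N, irreversible, [10.0, 6.0, 4.0]))    # balanced, signs allowed
print(satisfies_moc(N, irreversible, [10.0, 12.0, -2.0]))  # balanced, reversible flux negative
print(satisfies_moc(N, irreversible, [10.0, 6.0, 3.0]))    # violates the balance on A
print(satisfies_moc(N, irreversible, [-1.0, 1.0, -2.0]))   # violates irreversibility of v1
```

Only the first two vectors belong to the feasible space; the last two break the balance and the sign constraint, respectively.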

Consistency analysis 

The simplest consistency analysis could be performed 
checking that the flux states shown by cells fulfill the 
constraints imposed by the model. However, this simple 
approach would be impractical because measurements 
are imprecise and do not exactly satisfy the constraints. 
Such difficulty is overcome by taking into account 
uncertainty, as follows: 

$$ w_m = v_m + e_m \qquad (3) $$

where $e_m$ represents the error or deviation between
the actual fluxes $v_m$ and the measured values $w_m$.

Model and measurements can be considered consistent if there is
a vector $v$ fulfilling (2) and (3) for "reasonably small"
deviations $e_m$. Otherwise, we will conclude that model
and measurements are inconsistent. An easy way to
evaluate consistency is to find the flux vector $v$ fulfilling
(2) and (3) that minimises the (variance-weighted) sum
of squared errors:

$$ \min \varphi = e_m^{T} F^{-1} e_m \quad \text{s.t. } MOC \qquad (4) $$

where it is assumed that the errors $e_m$ are normally distributed
with zero mean and have a variance-covariance
matrix $F$. If only linear equality constraints are considered
in $MOC$, the residual $\varphi$ is a stochastic variable
following a $\chi^2$-distribution, and therefore a $\chi^2$-test can
be used to detect and evaluate the inconsistency. The
$\chi^2$-test is based upon statistical hypothesis testing to
determine if the deviation is within the expected experimental
error [8]. However, we want to consider inequality
constraints in (2), and therefore the $\chi^2$-test cannot
be used because its assumptions are not fulfilled ($\varphi$
no longer follows a $\chi^2$-distribution). Yet, the
residual $\varphi$ provides at least a rough indication of
consistency.
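
To make the equality-only case concrete, the sketch below evaluates the residual of (4) in closed form for a toy situation: a single mass balance, every flux measured, and a diagonal variance-covariance matrix. Under those assumptions the minimum of (4) reduces to the squared imbalance of the raw measurements divided by its variance. The balance row, the measured values and the function name are hypothetical, and the closed form does not hold once inequality constraints are active.

```python
def weighted_residual(balance, w, variances):
    """Minimum of (4) for one balance row n.v = 0, all fluxes measured, diagonal F."""
    h = sum(n * x for n, x in zip(balance, w))                   # imbalance of the raw data
    var_h = sum(n * n * f for n, f in zip(balance, variances))   # variance of that imbalance
    return h * h / var_h


w = [10.0, 6.0, 3.0]                        # hypothetical measurements: uptake, growth, by-product
variances = [(0.10 * x) ** 2 for x in w]    # 10% standard deviation on each measurement
phi = weighted_residual([1, -1, -1], w, variances)

CHI2_95_1DOF = 3.841   # 95% critical value of the chi-square distribution, 1 degree of freedom
print(round(phi, 3), phi < CHI2_95_1DOF)
```

Here the imbalance of one flux unit is small compared with the assumed measurement errors, so the test would not flag the data as inconsistent.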

Consistency analysis: Possibilistic MFA 

The consistency analysis can also be formulated as a
possibilistic constraint satisfaction problem, as recently
proposed in [16]. The basic idea is that a
flux vector fulfilling the model constraints (2) and compatible
with the measurements will be considered "possible",
otherwise "impossible". This can be refined to
cope with measurement errors by introducing the
notion of "degree of possibility".

We introduce a set of measurement constraints
($MSC$) considering imprecision, as in (3), but substituting
$e_m$ by two pairs of non-negative decision variables
(non-negative variables are chosen to formulate the calculations
as linear programming problems [16]):

$$ MSC = \begin{cases} w_m = v_m + \varepsilon_1 - \mu_1 + \varepsilon_2 - \mu_2 \\ \varepsilon_1, \mu_1 \ge 0 \\ 0 \le \varepsilon_2 \le \varepsilon_2^{\max} \\ 0 \le \mu_2 \le \mu_2^{\max} \end{cases} \qquad (5) $$



These decision variables $\varepsilon_1$, $\mu_1$, $\varepsilon_2$ and $\mu_2$ relax the
basic assertion $w_m = v_m$, defining a set of possibility
distributions in ($w_m$, $v_m$) associated with some cost index
$J$. Among different possible choices, a simple yet sensible
one is the linear cost index:

$$ J = \alpha \cdot \varepsilon_1 + \beta \cdot \mu_1 \qquad (6) $$

with $\alpha \ge 0$ and $\beta \ge 0$, which are row vectors of measurement
reliability coefficients.

The possibility $\pi$ of each solution $\delta$ of (2) and (5),
which corresponds to a particular flux vector $v$, is given
by the value of the cost index:

$$ \pi(\delta) = e^{-J(\delta)}, \quad \delta \in MSC \cap MOC \qquad (7) $$



The interpretation of (5-7) may be: "$w_m = v_m$ is fully
possible; the more $w_m$ differs from $v_m$, the less possible
such a situation is". See [16] for further technical
details.

By defining two pairs of decision variables, there is more
flexibility to represent the measurements in possibilistic
terms: the user can assign the bounds $\varepsilon_2^{\max}$ and $\mu_2^{\max}$
and the weights $\alpha$ and $\beta$. This way, each measurement is
represented by a distribution of possibility (see examples
in [16]). The bounds $\varepsilon_2^{\max}$ and $\mu_2^{\max}$ define an interval
of fully possible values (possibility $\pi$ = 1). For instance,
the user can choose a band of 10% around the measured
value. The values $\alpha$ and $\beta$ define the decreasing possibility
assigned to values outside this interval (details below).
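
The shape of such a possibility distribution can be sketched for a single measured flux. The snippet assumes scalar quantities and solves the one-flux version of (5-7) by inspection: the bounded variables absorb the deviation at no cost up to their limit, and only the excess is penalised. The function name and all numerical values are illustrative assumptions.

```python
import math


def possibility(w, v, eps2_max, mu2_max, alpha, beta):
    """Possibility of the flux value v given the measurement w, for one flux."""
    d = w - v                                   # deviation to be absorbed, w = v + d
    if d >= 0:
        cost = alpha * max(0.0, d - eps2_max)   # eps2 is free up to its bound, eps1 is penalised
    else:
        cost = beta * max(0.0, -d - mu2_max)    # same on the other side with mu2 and mu1
    return math.exp(-cost)


w = 2.0                 # hypothetical measured value
band = 0.10 * w         # band of fully possible values: 10% around w
alpha = beta = 5.0      # hypothetical reliability weights
for v in (2.0, 1.85, 2.5, 1.5):
    print(v, round(possibility(w, v, band, band, alpha, beta), 3))
```

Values inside the band keep possibility one, while the two values lying 0.5 away from the measurement share the same reduced possibility because the weights are symmetric.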

At this point, the maximum possibility (minimum-cost)
flux vector $v_{mp}$ corresponding to a given set of
measurements is obtained by solving a linear programming
(LP) problem:

$$ J_{\min} = \min_{\varepsilon, \mu, v} J \quad \text{s.t.} \quad \begin{cases} MOC \\ MSC \end{cases} \qquad (8) $$

The possibility of the most possible solution is then

$$ \pi_{mp} = \pi(v_{mp}) = e^{-J_{\min}} $$



This degree of possibility provides an indication of the
consistency between model ($MOC$) and measurements
($MSC$): a possibility equal to one must be interpreted
as complete agreement between the model and the original
measurements; lower values of possibility imply
that a certain error in the measurements is needed to find
a flux vector fulfilling the model constraints.
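
The LP of the maximum-possibility flux vector can be sketched in Python with SciPy's `linprog`. This is not the MATLAB/YALMIP implementation used in this work; it is a reduced illustration that assumes every flux is measured and irreversible, uses a toy one-metabolite network, and builds the bounds and weights from a 5% fully possible band and a possibility of 0.1 at 20% deviation. The function name and the data are hypothetical.

```python
import math

import numpy as np
from scipy.optimize import linprog


def most_possible_flux(N, w, band=0.05, ref_dev=0.20, ref_poss=0.1):
    """Minimum-cost LP when all fluxes are measured and irreversible.

    Returns the most possible flux vector and its possibility.
    """
    N = np.atleast_2d(np.asarray(N, dtype=float))
    w = np.asarray(w, dtype=float)
    m, n = N.shape
    # Symmetric reliability weights (beta = alpha), one per measurement.
    alpha = -math.log(ref_poss) / ((ref_dev - band) * np.abs(w))
    I, Z = np.eye(n), np.zeros((m, n))
    # Decision vector x = [v, eps1, mu1, eps2, mu2], each block of length n.
    c = np.concatenate([np.zeros(n), alpha, alpha, np.zeros(2 * n)])
    A_eq = np.block([[N, Z, Z, Z, Z],       # model constraints: N v = 0
                     [I, I, -I, I, -I]])    # v + eps1 - mu1 + eps2 - mu2 = w
    b_eq = np.concatenate([np.zeros(m), w])
    bounds = ([(0, None)] * (3 * n)                      # v, eps1, mu1 >= 0
              + [(0, band * abs(x)) for x in w] * 2)     # 0 <= eps2, mu2 <= band * |w|
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], math.exp(-res.fun)


# Toy network: v1 (uptake) feeds metabolite A, drained by v2 and v3.
N_toy = [[1, -1, -1]]
_, pi_consistent = most_possible_flux(N_toy, [10.0, 6.0, 4.0])
v_mp, pi_mp = most_possible_flux(N_toy, [10.0, 6.0, 3.0])
print(round(pi_consistent, 3), round(pi_mp, 3))
```

The first dataset closes the balance exactly and is fully possible; the second leaves one flux unit unbalanced, which the 5% bands cannot quite absorb, so its possibility falls slightly below one.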

Possibilistic estimation of non-measured fluxes 

Possibilistic MFA also enables estimating the metabolic
fluxes based on the model and the available measurements.
The simplest point-wise estimate is the minimum-cost
flux vector resulting from (8), which contains
the most possible value for each flux. However, a point-wise
estimate is limited when multiple combinations
might be reasonably possible. In this situation, a possibilistic
interval estimate is a better choice.

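The interval of values with conditional possibility
higher than $\gamma$ for a given flux $v_i$, $[v_{i,\gamma}^{m}, v_{i,\gamma}^{M}]$, can be
computed by solving two LP problems. The lower bound is

$$ v_{i,\gamma}^{m} = \min_{\varepsilon, \mu, v} v_i \quad \text{s.t.} \quad \begin{cases} MOC \cap MSC \\ J + \log \pi(v_{mp}) \le -\log \gamma \end{cases} \qquad (9) $$

Since, by (7), $\log \pi(v_{mp})$ is minus the minimum cost obtained in (8),
the left-hand side of the last constraint is the cost in excess of that of
the most possible solution. The upper bound $v_{i,\gamma}^{M}$ is obtained by
replacing the minimum by a maximum. Possibilistic intervals have
an interpretation similar to that of "confidence intervals" ("credible
intervals") in Bayesian statistics, and provide concise
but rich flux estimates. Please refer to the above-mentioned
article for details on the possibilistic framework
[16].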
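
An interval estimate for a non-measured flux can be sketched with SciPy's `linprog` as follows. The sketch is not the implementation used in this work: it assumes that all fluxes are irreversible, that the weights are symmetric and derived from a 5% fully possible band with possibility 0.1 at 20% deviation, and it applies the conditional-possibility condition as "cost in excess of the minimum cost is at most -log(gamma)". The toy network, the function name and the data are hypothetical.

```python
import math

import numpy as np
from scipy.optimize import linprog


def possibilistic_interval(N, meas_idx, w, target, gamma,
                           band=0.05, ref_dev=0.20, ref_poss=0.1):
    """Interval of flux `target` with conditional possibility >= gamma."""
    N = np.atleast_2d(np.asarray(N, dtype=float))
    w = np.asarray(w, dtype=float)
    m, n = N.shape
    k = len(meas_idx)
    alpha = -math.log(ref_poss) / ((ref_dev - band) * np.abs(w))  # beta = alpha
    S = np.zeros((k, n))
    S[np.arange(k), meas_idx] = 1.0     # selects the measured fluxes from v
    I = np.eye(k)
    # Decision vector x = [v (n), eps1 (k), mu1 (k), eps2 (k), mu2 (k)].
    cost = np.concatenate([np.zeros(n), alpha, alpha, np.zeros(2 * k)])
    A_eq = np.block([[N, np.zeros((m, 4 * k))],   # N v = 0
                     [S, I, -I, I, -I]])          # measurement constraints
    b_eq = np.concatenate([np.zeros(m), w])
    bounds = [(0, None)] * (n + 2 * k) + [(0, band * abs(x)) for x in w] * 2

    def solve(c, A_ub=None, b_ub=None):
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        if not res.success:
            raise RuntimeError(res.message)
        return res

    j_min = solve(cost).fun             # cost of the most possible solution
    # Conditional possibility >= gamma  <=>  J - J_min <= -log(gamma)
    A_ub, b_ub = cost[None, :], [j_min - math.log(gamma)]
    pick = np.zeros(n + 4 * k)
    pick[target] = 1.0
    lower = solve(pick, A_ub, b_ub).fun
    upper = -solve(-pick, A_ub, b_ub).fun
    return lower, upper


# Toy network: v1 (uptake) and v3 (by-product) measured, v2 (growth) estimated.
lower, upper = possibilistic_interval([[1, -1, -1]], [0, 2], [10.0, 3.0],
                                      target=1, gamma=0.5)
print(round(lower, 3), round(upper, 3))
```

Lowering `gamma` widens the interval, which mirrors the nested box, bar and line intervals used when reporting possibilistic estimates.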

Results and Discussion 

Metabolic Network of P. pastoris 

The metabolic network presented in Figure 1 is based
on the stoichiometric model defined in [22] for P. pastoris
growth on glucose, which has been extended with
reactions representing methanol and glycerol metabolism.
This is a simplified representation whose objective
is not to accurately describe the full biochemistry of the
yeast but to generate a model to which methodologies
of interest for process analysis, monitoring
and control can be applied.

The main catabolic pathways of the yeast P. pastoris
(Embden-Meyerhof-Parnas pathway, citric acid cycle,
pentose phosphate and fermentative pathways) are
represented for growth on the substrates mainly used
for its culture: glucose, glycerol and methanol. In this
case, a mean biomass equation derived from the macromolecular
composition of the yeast is used to summarize
the anabolic pathways according to [22]. Key
metabolites such as NAD, NADP, AcCoA, oxaloacetate
and pyruvate are considered in distinct cytosolic and
mitochondrial pools. Several alternative biomass
equations corresponding to Saccharomyces cerevisiae
models from the literature [4,23,24] were also
tested (data not shown) as detailed in the following sections,
and found to provide similar results. However, it
would be useful to evaluate the sensitivity with biomass
compositions specific to P. pastoris, if available.

Figure 1 Metabolic network of P. pastoris. Simplified representation of central carbon metabolism of the yeast during growth on glucose,
glycerol and methanol. A supplementary reaction represents biomass formation from selected metabolites (see Additional file 1).

The model contains 45 compounds and 44 metabolic
reactions. The balanced growth condition can be applied
to 36 internal metabolites, resulting in a 36 × 44 stoichiometric
matrix with 8 degrees of freedom (the matrix and
the list of reactions are given in Additional file 1). As in
[22], irreversibility is assumed for all reactions except for
{2-8; 15; 22-27; 29; 34}, and reaction 41 in order to
account for glycerol uptake, resulting in the constraint-based
model of the form (2), which is used hereinafter.
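
The number of degrees of freedom quoted above is the number of fluxes minus the rank of the stoichiometric matrix. A minimal sketch of that calculation is given below, using exact rational arithmetic so that no numerical tolerance is needed. The toy matrices and function names are hypothetical; the actual 36 × 44 matrix is the one provided in Additional file 1.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank by exact Gaussian elimination (Fractions avoid round-off)."""
    A = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(A[0]) if A else 0
    for col in range(n_cols):
        # Look for a non-zero pivot in this column among the unused rows.
        pivot = next((r for r in range(rank, len(A)) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, len(A)):
            factor = A[r][col] / A[rank][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[rank])]
        rank += 1
    return rank


def degrees_of_freedom(N):
    """Number of fluxes minus the number of independent balances."""
    return len(N[0]) - matrix_rank(N)


# Hypothetical toy: metabolites A and B; v1: -> A, v2: A -> B, v3: B ->, v4: A ->
N_toy = [[1, -1, 0, -1],
         [0, 1, -1, 0]]
print(degrees_of_freedom(N_toy))

# With 44 fluxes and 36 independent balances, the same count gives:
print(44 - 36)
```

Computing the rank, rather than simply counting rows, guards against linearly dependent balances, which would leave more degrees of freedom than the row count suggests.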



Elementary mode analysis 

Elementary mode analysis provides a way to systematically
identify a set of relevant pathways of a metabolic
network [25-27]. The elementary modes (EMs) are the
simplest (steady-state) flux distributions that cells can
show, whereas the remaining feasible states can be seen
as their aggregated action (without cancellation of reversible
fluxes). Moreover, the fact that they comprise all the
simple pathways in the network, the functional states or
non-decomposable vectors, makes it possible to investigate
the infinite behaviours that cells can show by simply
inspecting them. They have been used, for instance,
to analyse pathways considering optimality [25,28],
determine minimal medium requirements [12], and
infer viability of mutants [29].
The 98 elementary modes for the described network
were obtained using Metatool [30]. They are given in
Additional file 2. The set of EMs can be classified as
shown in Figure 2, depending first on their ability to produce
biomass, and second on the carbon source used:
glucose, methanol or glycerol. There are 17 EMs that do
not result in biomass production, whereas 9 generate
ethanol. No ethanol is produced in the single-substrate EMs
that produce biomass.

The carbon yields for biomass obtained for each EM
are shown in Table 1. The maximum yield is 4.93 Cmol
dcw/Cmol in the presence of glucose. Glucose is also the most
efficient substrate for growth in combination with
glycerol or methanol.

Methanol is the lowest-yielding substrate for biomass. This
is also illustrated in Figure 3. In the following sections,
11 different datasets compiled from the literature (Table
2) are used to determine whether the simplified model
described above is coherent with experimental data.

Validation: experimental and theoretical yields 

As a first validation, we checked that the experimental
growth yields did not exceed the maximum theoretical
ones given by the model (which were obtained by
inspection of the elementary modes in each category).
For instance, the theoretical yield for growth on glucose 
is 4.93, whereas the experimental one is 3.98 (Cmmol 
DW/mmol). The maximum yield on glycerol and 
methanol is 2.25, and the experimental ones at different 
ratios of glycerol and methanol range between 1.31 and 
0.63. It also seems that the experimental yields decrease 
for combinations of substrates with lower theoretical 
yields. 
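
The comparison described in this paragraph amounts to a simple bound check, sketched below with the yield values quoted in the text. The dictionary keys and the function name are illustrative labels, and only the two end points of the reported experimental range are used for the glycerol and methanol mixtures.

```python
# Yields quoted in the text (Cmmol DW/mmol); the category labels are illustrative.
theoretical_max = {"glucose": 4.93, "glycerol+methanol": 2.25}
experimental = {"glucose": [3.98], "glycerol+methanol": [1.31, 0.63]}


def yield_violations(theoretical_max, experimental):
    """Return (category, value) pairs where an experimental yield exceeds the model maximum."""
    return [(cat, y)
            for cat, ys in experimental.items()
            for y in ys
            if y > theoretical_max[cat]]


violations = yield_violations(theoretical_max, experimental)
print(violations)   # an empty list means no experimental yield exceeds its bound

# Fraction of the theoretical maximum reached in each case.
for cat, ys in experimental.items():
    print(cat, [round(y / theoretical_max[cat], 2) for y in ys])
```

An empty list of violations is a necessary condition for model validity rather than a proof of it, since the check only bounds the yields from above.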

Thus, no experimental yield violates the maximum 
theoretical ones (the contrary would indicate errors in 
the model because theoretical yields were obtained from 
it). However, the experimental yields tend to be lower 
than theoretical ones. There are several reasons for this 
deviation: (a) the model does not consider restrictions 
on energy cofactors, such as ATP, nor the resources 
devoted to recombinant protein production, (b) the EM 
analysis does not take into account the ratio between 
the different substrates in mixed cases, and (c) even if
optimal pathways exist, cells do not necessarily
use the pathways that are optimal in terms of
growth [25].

Validation: model and data consistency analysis 

The datasets in Table 2 were also used to check that the
experimental measurements, which reflect the metabolic
state of cells, are feasible states according to the model.
Two different analyses of consistency were performed:
one based on the minimized, variance-weighted sum of
squared residuals ($\varphi$) and another based on the
possibility of the most possible flux state or vector ($\pi$).
Both were described in the Methods section. The possibilistic
approach is preferred in this case because the
analysis of least-squares residuals has limitations due to
the presence of inequality constraints in the model.

Figure 2 Elementary modes of the network of P. pastoris.
Macroscopic equivalents of the corresponding elementary modes.
Panel A: EMs with growth on GLC + GOL, GLC + MET or GOL + MET.
Panel B: EMs with growth on GLC or GLY or methanol.
Panel C: EMs with growth on GLC + GLY + methanol.
Columns: O2, GLC, CO2, ETH, GOL, CIT, PYR, MET, BIO.
Blue denotes substances being consumed by the EM, and red those
being produced (the darker, the higher the stoichiometric coefficient).
Arrows highlight those EMs with the maximum theoretical yield
(in terms of growth) for each type.

Figure 3 Biomass yields for the elementary modes. Panel A
represents on each axis single-substrate consumption for biomass
growth. The most efficient modes are located nearer to the origin. Panel B
details the frontal projection for growth on glycerol and methanol. The
most profitable EMs are glucose-consuming.



In all weighted least squares problems, a standard 
deviation of 10% is assigned to each measurement of the 
set trying to capture their uncertainty. The variance-cov- 
ariance matrix F in (4) is defined accordingly. 

In the Possibilistic MFA problems, the uncertainty of
the measurements was represented as follows:

(a) Full possibility ($\pi$ = 1) is assigned to values near
the measured ones, with less than ±5% deviation, to
account for random errors.

(b) A decreasing possibility is assigned to larger
deviations so that values with a deviation equal to
±20% have a possibility of $\pi$ = 0.1 (those values with a
deviation of ±9.5% will have a possibility of $\pi$ = 0.5).



This representation is achieved by choosing the necessary
bounds ($\varepsilon_2^{\max}$, $\mu_2^{\max}$) and weights ($\alpha$, $\beta$) for each
measurement $w_m$. Due to (a), the bounds are defined as
$\varepsilon_2^{\max} = \mu_2^{\max} = 0.05 \cdot w_m$. Then we operate with equations
(5-7) to achieve (b). From (5) we have that
$0.2 \cdot w_m = \varepsilon_1^{20\%} + \varepsilon_2^{\max}$, and from (6) and (7),
$\log(0.1) = -\alpha \cdot \varepsilon_1^{20\%}$. As a result, we get
$\alpha = -\log(0.1) / ((0.2 - 0.05) \cdot w_m)$. Since uncertainty is symmetric, $\beta = \alpha$.
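
The arithmetic of this derivation can be checked with a few lines of Python. The sketch computes the weight from the two reference points stated in (a) and (b), and then inverts the relation to find the relative deviation that corresponds to a given possibility. The function names, the use of the absolute value of the measurement, and the measured value itself are illustrative assumptions; the formula is undefined for a measurement equal to zero.

```python
import math


def reliability_weight(w, full_band=0.05, ref_dev=0.20, ref_poss=0.1):
    """Weight such that a relative deviation ref_dev has possibility ref_poss."""
    return -math.log(ref_poss) / ((ref_dev - full_band) * abs(w))


def relative_deviation(poss, w, full_band=0.05, ref_dev=0.20, ref_poss=0.1):
    """Relative deviation from w whose possibility equals poss (0 < poss <= 1)."""
    alpha = reliability_weight(w, full_band, ref_dev, ref_poss)
    return full_band + (-math.log(poss)) / (alpha * abs(w))


w_m = 3.0                                       # hypothetical measured flux value
alpha = reliability_weight(w_m)
print(round(alpha, 4))
print(round(relative_deviation(0.1, w_m), 4))   # recovers the 20% reference point
print(round(relative_deviation(0.5, w_m), 4))   # close to the 9.5% quoted in (b)
```

Because the weight scales with the inverse of the measured value, the resulting relative deviations do not depend on the magnitude of the measurement.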

The results for each dataset are shown in Table 2,
where the values for $\varphi$ and $\pi(v_{mp})$ are given. The last
column provides another indicator of consistency: the
degree of measurement uncertainty needed to find a
flux vector in full agreement with the model constraints
($\pi$ = 1). All the computations were performed with
MATLAB (MathWorks Inc., 2003), and the YALMIP toolbox
[31] was used to conduct Possibilistic MFA.

The consistency between model and experimental
measurements is very high, except for a small set of datasets. In these
cases, the inconsistency pinpoints particular characteristics
of these sets of data, as explained below.

The dataset D1, which corresponds to Pichia growing
on glucose, shows very good agreement. The measured
data have full possibility ($\pi$ = 1), meaning that there is a
flux vector compatible with model and measurements.
In fact, as shown in the last column, a band of 1%
around the measured values is sufficient to enclose this
flux vector. Notice also that the residual is very low.

Datasets A1 and A2, which correspond to cultures
growing totally or mainly on glycerol and producing a
small amount of protein, also show good agreement.
The discrepancy between measurements and model is
larger for A3 ($\pi$ = 0.25), but a band of 10% deviation
around the measurements still encloses a flux vector compatible
with the model. Dataset A3 corresponds to a
culture growing mainly on methanol, but supplemented
with glycerol, and producing larger amounts of protein.
The discrepancy is larger for A4, which corresponds to
a scenario with high protein productivity.
Table 2 Experimental data and model consistency

*All the datasets correspond to continuous fermentations in defined chemical media. Further detail can be found in D: Dragosits et al. [22]; A: Sola et al. [21]; B:
Sola et al. [21]; C: Jungo et al. [19]. Citrate and pyruvate are assumed to be neither produced nor consumed, except for dataset D1, in which citrate is consumed at
0.007 Cmol·kg⁻¹·h⁻¹.

**Minimized sum of squared residuals ($\varphi$), possibility of the most possible flux vector ($\pi$) and degree of measurement uncertainty needed to reach $\pi$ = 1.



Similar results are obtained with cultures at a higher
growth rate (datasets B1-B3): B1 and B2 are highly consistent,
while the protein-producing B3 shows similar behaviour
to A3-A4. This suggests the existence of non-modelled
phenomena, probably related to protein production.
The agreement is quite good for the three datasets
C1-C3, but the increase of the discrepancy along
with higher protein expression is also noticeable.

Finally, we used two batteries of random datasets to
assess whether the model is indeed able to reject flux
distributions that do not correspond to actual states of P.
pastoris cultures. These datasets were defined by taking
random combinations of values for each flux within predefined
bounds (see Table 2). Most of these random
scenarios were highly inconsistent with the model (possibilities
lower than 0.1 in 99% and 95% of the datasets,
for each battery).

In summary, the constraint-based model shows accep- 
table agreement with the experimental data reported by 
different groups for P. pastoris cultures, and at the same 
time, rejects artificially generated invalid datasets. The 
scenarios with lower agreement pinpoint unmodelled 
phenomena, possibly related to protein expression. 

Using the model to predict growth 

Possibilistic MFA can now be applied to the constraint-based
model and the available measurements in order to
estimate the biomass growth rate for each of the previous
datasets. Details of this estimation can be found in
the Methods section. PMFA is applied to the datasets
shown above, excluding the measured value of the
growth rate (which is used to validate the estimation).
Results are depicted in Figure 4.

The estimated growth rate is found to be in very good
agreement with the measured one for the vast majority
of the analysed scenarios (D1, A1, A3, A4, B1, B2, B3,
C1 and C2), which correspond to cultures at different
growth rates, using different substrates, and coming
from three independent literature references. For two
other scenarios (A2 and C3), the most possible estimate
is still accurate.

The fact that, although limited, the model has predic- 
tive capacity provides further validation for this con- 
straint-based representation. This conclusion is 
strengthened if we consider that the growth rate is 
highly interconnected along the whole network, since 
the biomass equation takes into account several meta- 
bolic precursors, and thus accurate correspondence 
between substrate uptake, respiratory fluxes and growth 
cannot be inferred in a straightforward way from the
network.

Using the model to estimate the whole flux distribution 

Once the model has been validated, possibilistic MFA
could be used to estimate all the non-measured fluxes,
either intracellular or extracellular, as done with the
growth rate in the previous section. For illustration purposes,
the flux distributions for each scenario are given
in Additional file 3.

Notice that these estimations cannot be done by 
means of traditional MFA because the measurements 
would be insufficient to get a determined system. 
Figure 4 Prediction of growth rate for P. pastoris cultures using Possibilistic MFA. Crosses denote the measured values and circles the most
possible estimates for each dataset (horizontal axis). The intervals of possibility 0.8 (box), 0.5 (bar) and 0.1 (line) are also depicted. Biomass specific growth
rate is estimated as biomass efflux, expressed in Cmol·kg⁻¹·h⁻¹ units taking into account the equivalent molecular weight of biomass provided in
[19,21,22].



The network has 8 degrees of freedom (44 fluxes and 
36 linear equations) and there are 9 measured fluxes. 
However, these measurements introduce only 7 inde- 
pendent additional linear constraints, so the system 
remains under-determined with 1 degree of freedom 
[32]. Possibilistic MFA is able to get an estimate thanks 
to the irreversibility constraints (other approaches con- 
sidering these could also provide an estimate). Possibilis- 
tic estimates of fluxes of particular interest are also 
useful to perform a comparative analysis between the 
different scenarios and datasets. For instance, the esti- 
mates for three relevant groups of fluxes, which repre- 
sent splitting nodes within the network, are depicted in 
Figure 5: 

- Fluxes $v_2$, $v_3$ and $v_4$, belonging to the glycolysis
pathway, are positive as expected in cultures grown
on glucose, and appear inverted in glycerol- and/or
methanol-fed cultures.

- Fluxes $v_{21}$, $v_{22}$ and $v_{23}$ represent the isomerization
of R5P into Ru5P and Xu5P. Note how $v_{23}$ inverts
its direction at increasing methanol fluxes, as increased
methanol consumption demands higher amounts of
Xu5P, thus requiring more R5P precursor.

- Fluxes $v_{32}$, $v_{33}$ and $v_{34}$ represent the branch point
related to methanol usage, that is, how this flux is
split between direct oxidation and catabolic pathways.
High methanol fluxes are necessarily
conducted via CO2 generation and thus flux $v_{34}$
becomes distinct from zero in the A4, B4, C2 and C3
scenarios.

In this way, these results further validate the predictive 
capability of the model. 

Conclusions 

The consistency of a constraint-based model of Pichia 
pastoris has been validated in several experimental 
scenarios resulting in good agreement between estima- 
tions and measurements. In addition, the predictive 
capacity of the model for cell growth rate, an attrac- 
tive target for industrial fermentation monitoring and 
control, has been verified. Interestingly, the accuracy
of predictions worsens for scenarios with higher protein
production, showing how the model, derived for a wild-type
strain, becomes less precise as more
resources are devoted to recombinant protein
generation.

It must be highlighted that the model has been strictly
constructed upon first principles and sensible hypotheses.
At this point, the model can be curated, extended,
and its parameters tuned in order to improve the con- 
sistency with the investigated scenarios. Particularly, 
energy requirements, strongly related to protein expres- 
sion, are not yet considered within the model and future 
work will address this issue. 
Figure 5 Estimations for a set of relevant fluxes in each scenario. Most possible values (circles and squares for measured and non-measured
fluxes, respectively) and intervals of conditional possibilities 0.8, 0.5 and 0.1 are depicted for each flux.



This contribution shows how a small-sized network
can in general be assessed following a rational, quantitative
procedure even when measurements are scarce.
Possibilistic MFA becomes a useful tool to systematize
this procedure. This approach enables validation considering
the stoichiometric balances and the reversibility
of reactions, while accounting for measurement
imprecision. The use of Possibilistic MFA also makes it
possible to predict non-measured fluxes without removing
the network under-determinacy. There is, however,
a challenge when validating networks with a higher number
of degrees of freedom because there may be many
flux vectors compatible with the (few) available measurements.
It is expected that the datasets will be highly
consistent, so the approach in this case would be to
check if the model rejects the artificially generated invalid
datasets.

When a validated model is available, ideally incorporating
measurements for some intracellular fluxes, the
kind of comparative analysis proposed herein will provide
insight into how the internal state of the cells
determines their external behaviour, and potentially guide
intervention within cells, suggesting target metabolites
or biochemical branch points and also allowing optimization
through manipulation of extracellular variables,
such as feeding strategies and substrate selection.

Additional material 



Additional file 1: Metabolic network for P. pastoris. This includes the 
list of reactions, metabolites and stoichiometric matrix. 

Additional file 2: Elementary mode analysis. This file includes the 
whole set of elementary modes, the corresponding macroreactions and 
the calculation of the theoretical yields. 

Additional file 3: Complete flux distribution per scenario. This file
includes the figures representing the estimation of each intracellular flux
for all datasets.